#### CS202: COMPUTER ORGANIZATION

#### Lecture 5

Instruction Set Architecture (3)

## Today's Topic

- Recap:
  - More control instructions
  - Procedure call
- Today's topic:
  - MIPS addressing
  - Translating and starting a program
  - A C sort example
  - Other popular ISAs

## MIPS Addressing

- Addressing: how an instruction identifies its operands.
- MIPS addressing modes:
  - Immediate addressing, e.g. `addi $s0, $s1, 5`
  - Register addressing
  - Base/displacement addressing
  - PC-relative addressing
  - Pseudo-direct addressing

## **Immediate Addressing**

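- For instructions that include an immediate operand
  - E.g. addi, andi, ori, slti (there is no subi machine instruction; subtracting a constant uses addi with a negative immediate)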



- Most constants are small
  - 16-bit immediate is sufficient
- For the occasional 32-bit constant, e.g. y = x + 4000000:
  - 4000000<sub>dec</sub> = 11 1101 0000 1001 0000 0000<sub>bin</sub> needs 22 bits, more than a 16-bit immediate can hold

#### lui rt, constant

- Copies 16-bit constant to left 16 bits of rt
- Clears right 16 bits of rt to 0

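`lui $s0, 61` sets \$s0 to:

0000 0000 0011 1101 0000 0000 0000 0000

`ori $s0, $s0, 2304` then sets \$s0 to:

0000 0000 0011 1101 0000 1001 0000 0000

(61 = 0000 0000 0011 1101<sub>bin</sub> is the upper half of 4000000, and 2304 = 0000 1001 0000 0000<sub>bin</sub> is the lower half.)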
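As a quick check of the arithmetic above, here is a minimal C sketch that splits a 32-bit constant into the two 16-bit immediates and models what `lui` followed by `ori` leaves in the register. The helper names `split_constant` and `lui_then_ori` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit constant into the upper and lower
   16-bit immediates for "lui rt, upper" followed by "ori rt, rt, lower". */
void split_constant(uint32_t value, uint16_t *upper, uint16_t *lower) {
    *upper = (uint16_t)(value >> 16);     /* immediate for lui */
    *lower = (uint16_t)(value & 0xFFFFu); /* immediate for ori */
}

/* Hypothetical helper: model the register contents after the two instructions. */
uint32_t lui_then_ori(uint16_t upper, uint16_t lower) {
    uint32_t reg = (uint32_t)upper << 16; /* lui: copy to left 16 bits, clear right 16 bits */
    reg |= (uint32_t)lower;               /* ori: zero-extended immediate OR-ed into right half */
    return reg;
}
```

Calling `split_constant(4000000, &upper, &lower)` should yield 61 and 2304, the two immediates used above. `ori` is used for the low half rather than `addi` because `ori` zero-extends its immediate while `addi` sign-extends it, which would disturb the upper half whenever bit 15 of the low half is set.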

### Register Addressing

- The operand is held in a register
- E.g. add, addi, sub, lw, ...



## Base/displacement Addressing

- The operand is in memory, at the address given by the sum of a register and a constant
  - Used by lw/lh/lb/sw/sh/sb
- E.g. `lw $s0, 4($s1)`
  - op of lw: 100011 (35)
  - rs: 10001 (register number of \$s1, 17); rt: 10000 (register number of \$s0, 16); address: 100 (4)
  - The word at memory address \$s1 + 4 is loaded into \$s0
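The field layout of `lw $s0, 4($s1)` can be reproduced with shifts and masks. This is a minimal C sketch assuming the I-format layout op(6) rs(5) rt(5) immediate(16); `encode_i_format` and `effective_address` are hypothetical helper names, and the base value 0x10000000 in the usage is an arbitrary example.

```c
#include <stdint.h>

/* Hypothetical helper: pack I-format fields op(6) rs(5) rt(5) immediate(16). */
uint32_t encode_i_format(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    return ((op & 0x3Fu) << 26) |
           ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) |
           ((uint32_t)imm & 0xFFFFu); /* low 16 bits, two's complement */
}

/* Hypothetical helper: base/displacement effective address,
   base register value + sign-extended 16-bit offset. */
uint32_t effective_address(uint32_t base, int16_t offset) {
    return base + (uint32_t)(int32_t)offset;
}
```

`encode_i_format(35, 17, 16, 4)` packs the fields listed above into 0x8E300004. `effective_address` shows the run-time address computation; because the offset is sign-extended, negative displacements such as `-4($sp)` work too, e.g. `effective_address(0x10000000, 4)` gives 0x10000004.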

## Branch Addressing (PC-relative addressing)

- Branch instructions specify
  - Opcode, two registers, target address
  - e.g. `beq $s0, $s1, label`
- Most branch targets are near the branch, either forward or backward
- Example (forward branch over one instruction):
  - `beq $s0, $s1, label` is encoded as op = 000100, rs = 10000 (\$s0), rt = 10001 (\$s1), constant = 0000 0000 0000 0001
  - `add $s2, $s3, $s4`
  - `label: sub $s2, $s3, $s4`
- PC-relative addressing
  - Target address = PC + 4 + constant $\times$ 4
  - Here: PC + 4 + 1 $\times$ 4 = PC + 8, the address of `label`
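The PC-relative formula can be written directly in C. This sketch assumes a 32-bit PC and treats the 16-bit constant as a signed word offset; `branch_target` is a hypothetical name, and the address 0x00400000 in the usage is an arbitrary example.

```c
#include <stdint.h>

/* PC-relative addressing: target = PC + 4 + constant * 4,
   where constant is the sign-extended 16-bit field of beq/bne. */
uint32_t branch_target(uint32_t pc, int16_t constant) {
    int32_t byte_offset = (int32_t)constant * 4; /* words to bytes */
    return pc + 4u + (uint32_t)byte_offset;      /* unsigned wrap handles negative offsets */
}
```

With the `beq` above at 0x00400000 and constant 1, `branch_target(0x00400000, 1)` gives 0x00400008, the address of `label`; a negative constant branches backward. Since the constant is a signed 16-bit word count, the reach is roughly ±128 KB around PC + 4.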

## Target Addressing Example

- Loop code from earlier example
  - Assume Loop at location 80000

| Label | Instruction           | Address | Machine fields (decimal)                    |
|-------|-----------------------|---------|---------------------------------------------|
| Loop: | sll \$t1, \$s3, 2     | 80000   | op=0, rs=0, rt=19, rd=9, shamt=2, funct=0   |
|       | add \$t1, \$t1, \$s6  | 80004   | op=0, rs=9, rt=22, rd=9, shamt=0, funct=32  |
|       | lw \$t0, 0(\$t1)      | 80008   | op=35, rs=9, rt=8, immediate=0              |
|       | bne \$t0, \$s5, Exit  | 80012   | op=5, rs=8, rt=21, immediate=2              |
|       | addi \$s3, \$s3, 1    | 80016   | op=8, rs=19, rt=19, immediate=1             |
|       | j Loop                | 80020   | op=2, address=20000                         |
| Exit: | ...                   | 80024   |                                             |
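The two address-dependent fields in the table (the `bne` constant and the `j` address) can be recomputed from the byte addresses. A minimal C sketch; `branch_field` and `jump_field` are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit constant field of beq/bne, i.e. the distance
   in words from the instruction after the branch (PC + 4) to the target. */
int32_t branch_field(uint32_t branch_addr, uint32_t target_addr) {
    int64_t byte_distance = (int64_t)target_addr - ((int64_t)branch_addr + 4);
    return (int32_t)(byte_distance / 4);
}

/* Hypothetical helper: 26-bit address field of j/jal, the target's word address. */
uint32_t jump_field(uint32_t target_addr) {
    return (target_addr >> 2) & 0x03FFFFFFu;
}
```

`branch_field(80012, 80024)` yields 2 (Exit is two words past PC + 4 = 80016), and `jump_field(80000)` yields 20000, matching the `bne` and `j` rows.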

## Jump Addressing (Pseudo-direct addressing)

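- Jump (j and jal) targets could be anywhere in the text segment, e.g. `0x00400010: j label`
  - Encode (as much as possible of) the full address in the instruction: a 26-bit field
- (Pseudo)direct jump addressing
  - Target address = PC<sub>31…28</sub> : (address $\times$ 4), where ":" denotes concatenation of the upper 4 bits of the PC with the 26-bit field shifted left by 2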
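The concatenation can be sketched in C with a mask and a shift. This follows the formula above, which takes the upper 4 bits from the PC; MIPS hardware actually uses the upper bits of PC + 4, and the two differ only for a jump in the last word of a 256 MB region. `jump_target` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: pseudo-direct target for j/jal.
   Upper 4 bits come from the PC; the 26-bit field is a word address. */
uint32_t jump_target(uint32_t pc, uint32_t address_field) {
    uint32_t upper4 = pc & 0xF0000000u;                    /* PC[31:28] */
    uint32_t low28  = (address_field & 0x03FFFFFFu) << 2;  /* address x 4 */
    return upper4 | low28;
}
```

Any target inside the same 256 MB region as the PC is reachable; a jump outside it would need `jr` with a full 32-bit register value.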

### Branching Far Away

- If branch target is too far to encode with 16-bit offset, assembler rewrites the code
- Example

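```
# original code: L1 may be too far for a 16-bit offset
beq $s0, $s1, L1

# rewritten by the assembler:
bne $s0, $s1, L2    # inverted condition skips the jump
j   L1              # pseudo-direct jump reaches L1
L2: ...
```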

## Addressing Mode Summary



## There is no "direct addressing"

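```
.data
str:    .asciiz "the answer = "
.text
main:
        li   $v0, 4          # syscall 4: print string
        la   $a0, str        # $a0 = address of str
        syscall
        lb   $t0, ($a0)      # load the first byte of str
        li   $v0, 10         # syscall 10: exit
        syscall
```

We can only use loads and stores (lw/sw, lh/sh, lb/sb) to access memory, and they always use base/displacement addressing.

`la` expands into `lui`/`ori` (immediate addressing); `lb` uses base/displacement addressing.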

Assembled code (MARS text segment):

| Address    | Code       | Basic                    | Source             |
|------------|------------|--------------------------|--------------------|
| 0x00400000 | 0x24020004 | addiu \$2,\$0,0x00000004 | 5: li \$v0, 4      |
| 0x00400004 | 0x3c011001 | lui \$1,0x00001001       | 6: la \$a0, str    |
| 0x00400008 | 0x34240000 | ori \$4,\$1,0x00000000   |                    |
| 0x0040000c | 0x0000000c | syscall                  | 7: syscall         |
| 0x00400010 | 0x80880000 | lb \$8,0x00000000(\$4)   | 8: lb \$t0, (\$a0) |
| 0x00400014 | 0x2402000a | addiu \$2,\$0,0x0000000a | 10: li \$v0, 10    |
| 0x00400018 | 0x0000000c | syscall                  | 11: syscall        |

Register names and numbers:

| Name   | Number |
|--------|--------|
| \$zero | 0      |
| \$at   | 1      |
| \$v0   | 2      |
| \$v1   | 3      |
| \$a0   | 4      |
| ...    | ...    |

## **Decoding Machine Language**

What is the assembly language of the following machine instruction?

#### 0x00af8020

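- Hex to bin: 0000 0000 1010 1111 1000 0000 0010 0000
- Split into R-format fields:

| op     | rs        | rt         | rd         | shamt | funct    |
|--------|-----------|------------|------------|-------|----------|
| 000000 | 00101     | 01111      | 10000      | 00000 | 100000   |
| 0      | 5 (\$a1)  | 15 (\$t7)  | 16 (\$s0)  | 0     | 32 (add) |

- Get the instruction: add \$s0, \$a1, \$t7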
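The decoding steps above amount to slicing the word with shifts and masks. A minimal C sketch; the struct `rfields`, the function `decode_r_format` and the table `mips_reg_names` are hypothetical names, while the register names themselves are the standard MIPS conventions.

```c
#include <stdint.h>

/* Hypothetical struct holding the six R-format fields. */
struct rfields {
    unsigned op, rs, rt, rd, shamt, funct;
};

/* Conventional MIPS register names, indexed by register number. */
static const char *const mips_reg_names[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Hypothetical helper: slice a 32-bit word into op rs rt rd shamt funct. */
struct rfields decode_r_format(uint32_t word) {
    struct rfields f;
    f.op    = (word >> 26) & 0x3Fu; /* bits 31:26 */
    f.rs    = (word >> 21) & 0x1Fu; /* bits 25:21 */
    f.rt    = (word >> 16) & 0x1Fu; /* bits 20:16 */
    f.rd    = (word >> 11) & 0x1Fu; /* bits 15:11 */
    f.shamt = (word >> 6) & 0x1Fu;  /* bits 10:6 */
    f.funct = word & 0x3Fu;         /* bits 5:0 */
    return f;
}
```

For 0x00af8020 this gives op = 0, rs = 5, rt = 15, rd = 16, funct = 32; `mips_reg_names` maps the registers to \$a1, \$t7 and \$s0, and because the assembly order is `add rd, rs, rt` the result is `add $s0, $a1, $t7`.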

# **Decoding Machine Language**

| op(31:26); rows: bits 31–29, columns: bits 28–26 | 0(000)              | 1(001)     | 2(010)                | 3(011)                            | 4(100)                | 5(101)                   | 6(110) | 7(111)                  |
|--------------------------------------------------|---------------------|------------|-----------------------|-----------------------------------|-----------------------|--------------------------|--------|-------------------------|
| 0(000)                                           | R-format            | bltz/gez   | jump                  | jump & link                       | branch eq             | branch<br>ne             | blez   | bgtz                    |
| 1(001)                                           | add<br>immediate    | addiu      | set less<br>than imm. | set less<br>than imm.<br>unsigned | andi                  | ori                      | xori   | load upper<br>immediate |
| 2(010)                                           | TLB                 | FlPt       |                       |                                   |                       |                          |        |                         |
| 3(011)                                           |                     |            |                       |                                   |                       |                          |        |                         |
| 4(100)                                           | load byte           | load half  | lwl                   | load word                         | load byte<br>unsigned | load<br>half<br>unsigned | lwr    |                         |
| 5(101)                                           | store byte          | store half | swl                   | store word                        |                       |                          | swr    |                         |
| 6(110)                                           | load linked<br>word | lwc1       |                       |                                   |                       |                          |        |                         |
| 7(111)                                           | store cond.<br>word | swc1       |                       |                                   |                       |                          |        |                         |

# Decoding Machine Language

| op(31:26) = 000000 (R-format), funct(5:0); rows: bits 5–3, columns: bits 2–0 | 0(000)                | 1(001) | 2(010)                 | 3(011)               | 4(100)  | 5(101) | 6(110) | 7(111)       |
|------------------------------------------------------------------------------|-----------------------|--------|------------------------|----------------------|---------|--------|--------|--------------|
| 0(000)                                                                       | shift left<br>logical |        | shift right<br>logical | sra                  | sllv    |        | srlv   | srav         |
| 1(001)                                                                       | jump register         | jalr   |                        |                      | syscall | break  |        |              |
| 2(010)                                                                       | mfhi                  | mthi   | mflo                   | mtlo                 |         |        |        |              |
| 3(011)                                                                       | mult                  | multu  | div                    | divu                 |         |        |        |              |
| 4(100)                                                                       | add                   | addu   | subtract               | subu                 | and     | or     | xor    | not or (nor) |
| 5(101)                                                                       |                       |        | set l.t.               | set l.t.<br>unsigned |         |        |        |              |
| 6(110)                                                                       |                       |        |                        |                      |         |        |        |              |
| 7(111)                                                                       |                       |        |                        |                      |         |        |        |              |

#### **MIPS Instruction Formats**

| Name       | Fields |                          |        |                             |        |        | Comments                               |
|------------|--------|--------------------------|--------|-----------------------------|--------|--------|----------------------------------------|
| Field size | 6 bits | 5 bits                   | 5 bits | 5 bits                      | 5 bits | 6 bits | All MIPS instructions are 32 bits long |
| R-format   | op     | rs                       | rt     | rd                          | shamt  | funct  | Arithmetic instruction format          |
| I-format   | op     | rs                       | rt     | address/immediate (16 bits) |        |        | Transfer, branch, imm. format          |
| J-format   | op     | target address (26 bits) |        |                             |        |        | Jump instruction format                |

## Starting a C Program



#### Role of Assembler

- Convert pseudo-instructions into actual hardware instructions
  - pseudo-instrs make it easier to program in assembly
  - examples: "move", "blt", 32-bit immediate operands, etc.
- Convert assembly instrs into machine instrs
  - a separate object file (x.o) is created for each C file (x.c)
  - compute the actual values for instruction labels
  - maintain info on external references and debugging information

#### Role of Linker

- Stitches different object files into a single executable
  - patch internal and external references
  - determine addresses of data and instruction labels
  - organize code and data modules in memory
- Some libraries (DLLs) are dynamically linked
  - the executable points to dummy routines
  - these dummy routines call the dynamic linker-loader so they can update the executable to jump to the correct routine

#### Object file 1:

| Object file header     |           |                    |            |
|------------------------|-----------|--------------------|------------|
|                        | Name      | Procedure A        |            |
|                        | Text size | 100 <sub>hex</sub> |            |
|                        | Data size | 20 <sub>hex</sub>  |            |
| Text segment           | Address   | Instruction        |            |
|                        | 0         | lw \$a0, 0(\$gp)   |            |
|                        | 4         | jal 0              |            |
|                        |           |                    |            |
| Data segment           | 0         | (X)                |            |
|                        |           |                    |            |
| Relocation information | Address   | Instruction type   | Dependency |
|                        | 0         | lw                 | X          |
|                        | 4         | jal                | B          |
| Symbol table           | Label     | Address            |            |
|                        | X         | –                  |            |
|                        | B         | –                  |            |

#### Object file 2:

| Object file header     |           |                    |            |
|------------------------|-----------|--------------------|------------|
|                        | Name      | Procedure B        |            |
|                        | Text size | 200 <sub>hex</sub> |            |
|                        | Data size | 30 <sub>hex</sub>  |            |
| Text segment           | Address   | Instruction        |            |
|                        | 0         | sw \$a1, 0(\$gp)   |            |
|                        | 4         | jal 0              |            |
|                        |           |                    |            |
| Data segment           | 0         | (Y)                |            |
|                        |           |                    |            |
| Relocation information | Address   | Instruction type   | Dependency |
|                        | 0         | sw                 | Y          |
|                        | 4         | jal                | A          |
| Symbol table           | Label     | Address            |            |
|                        | Y         | –                  |            |
|                        | A         | –                  |            |

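#### Executable file

$\$gp + 8000_{hex} = 10008000_{hex} + \mathrm{ffff}8000_{hex} = 10000000_{hex}$ (the 16-bit offset 8000<sub>hex</sub> is sign-extended to ffff8000<sub>hex</sub>, so X at 10000000<sub>hex</sub> is reached from \$gp = 10008000<sub>hex</sub>)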

| Executable file header |                          |                                    |
|------------------------|--------------------------|------------------------------------|
|                        | Text size                | 300 <sub>hex</sub>                 |
|                        | Data size                | 50 <sub>hex</sub>                  |
| Text segment           | Address                  | Instruction                        |
| A                      | 0040 0000 <sub>hex</sub> | lw \$a0, 8000 <sub>hex</sub>(\$gp) |
|                        | 0040 0004 <sub>hex</sub> | jal 40 0100 <sub>hex</sub>         |
|                        | ...                      | ...                                |
| B                      | 0040 0100 <sub>hex</sub> | sw \$a1, 8020 <sub>hex</sub>(\$gp) |
|                        | 0040 0104 <sub>hex</sub> | jal 40 0000 <sub>hex</sub>         |
|                        | ...                      | ...                                |
| Data segment           | Address                  |                                    |
| X                      | 1000 0000 <sub>hex</sub> | (X)                                |
|                        | ...                      | ...                                |
| Y                      | 1000 0020 <sub>hex</sub> | (Y)                                |
|                        | ...                      | ...                                |
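The linker's patch for `lw`/`sw ... ($gp)` turns a final data address into a signed 16-bit offset from \$gp. A minimal C sketch, assuming \$gp = 10008000<sub>hex</sub> as in this example; `gp_offset` is a hypothetical helper, not a real linker API.

```c
#include <stdint.h>

/* Hypothetical helper: compute the 16-bit field the linker patches into
   "lw/sw rt, offset($gp)" so that $gp + sign_extend(offset) == data_addr.
   Returns 0 on success, -1 if the address is out of reach of $gp. */
int gp_offset(uint32_t gp, uint32_t data_addr, uint16_t *field) {
    int64_t diff = (int64_t)data_addr - (int64_t)gp;
    if (diff < -32768 || diff > 32767) {
        return -1;             /* does not fit in a signed 16-bit offset */
    }
    *field = (uint16_t)diff;   /* two's-complement low 16 bits */
    return 0;
}
```

For X at 10000000<sub>hex</sub> this gives 8000<sub>hex</sub>, and for Y at 10000020<sub>hex</sub> it gives 8020<sub>hex</sub>, the offsets in the executable's `lw` and `sw`. Placing \$gp in the middle of the data area is what lets one register cover 64 KB of globals in both directions.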

## **Starting Java Applications**



### Full Example – Sort in C (pg. 133)

```
void swap (int v[], int k);   /* declared first because sort calls it before its definition */

void sort (int v[], int n)
{
    int i, j;
    for (i=0; i<n; i+=1) {
        for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) {
            swap (v,j);
        }
    }
}
```

```
void swap (int v[], int k)
{
   int temp;
   temp = v[k];
   v[k] = v[k+1];
   v[k+1] = temp;
}
```

- Allocate registers to program variables
- Produce code for the program body
- Preserve registers across procedure invocations

### The swap Procedure

- Register allocation: \$a0 and \$a1 for the two arguments, \$t0 for the temp variable (\$t1 and \$t2 hold intermediate values); no need for saves and restores, as we're not using \$s0-\$s7 and this is a leaf procedure (won't need to re-use \$a0 and \$a1)

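```
swap: sll $t1, $a1, 2     # $t1 = k * 4 (byte offset)
      add $t1, $a0, $t1   # $t1 = address of v[k]
      lw  $t0, 0($t1)     # temp = v[k]
      lw  $t2, 4($t1)     # $t2 = v[k+1]
      sw  $t2, 0($t1)     # v[k] = v[k+1]
      sw  $t0, 4($t1)     # v[k+1] = temp
      jr  $ra             # return to caller
```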


#### The sort Procedure

- Register allocation: arguments v and n use \$a0 and \$a1, i and j use \$s0 and \$s1; must save \$a0 and \$a1 before calling the leaf procedure
- The outer for loop looks like this: (note the use of pseudo-instrs)

```
        move $s0, $zero          # initialize the loop: i = 0
loopbody1:
        bge  $s0, $a1, exit1     # exit if i >= n (will eventually use slt and beq)
        ... body of inner loop ...
        addi $s0, $s0, 1         # i += 1
        j    loopbody1
exit1:
```

```
for (i=0; i<n; i+=1) {
  for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) {
     swap (v,j);
  }
}
```

#### The sort Procedure

The inner for loop looks like this:

```
        addi $s1, $s0, -1        # initialize the loop: j = i - 1
loopbody2:
        blt  $s1, $zero, exit2   # exit if j < 0 (will eventually use slt and beq)
        sll  $t1, $s1, 2         # $t1 = j * 4
        add  $t2, $a0, $t1       # $t2 = address of v[j]
        lw   $t3, 0($t2)         # $t3 = v[j]
        lw   $t4, 4($t2)         # $t4 = v[j+1]
        ble  $t3, $t4, exit2     # exit unless v[j] > v[j+1]
        ... body of inner loop ...
        addi $s1, $s1, -1        # j -= 1
        j    loopbody2
exit2:
```
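To see how the two loops map onto branches and jumps, the same algorithm can be written in C with explicit labels and `goto`, one statement per branch or jump in the assembly. This is a hypothetical illustration (`sort_goto` is not from the textbook); it repeats the `swap` routine from the C source so the block compiles on its own.

```c
/* swap exactly as in the C source. */
void swap(int v[], int k) {
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Hypothetical goto version of sort: labels mirror loopbody1/exit1
   (outer loop over i) and loopbody2/exit2 (inner loop over j). */
void sort_goto(int v[], int n) {
    int i, j;
    i = 0;                              /* move $s0, $zero */
loopbody1:
    if (i >= n) goto exit1;             /* outer test: bge $s0, $a1, exit1 */
    j = i - 1;                          /* addi $s1, $s0, -1 */
loopbody2:
    if (j < 0) goto exit2;              /* inner test 1: j >= 0 */
    if (!(v[j] > v[j + 1])) goto exit2; /* inner test 2: stop unless v[j] > v[j+1] */
    swap(v, j);                         /* jal swap */
    j -= 1;                             /* addi $s1, $s1, -1 */
    goto loopbody2;                     /* j loopbody2 */
exit2:
    i += 1;                             /* addi $s0, $s0, 1 */
    goto loopbody1;                     /* j loopbody1 */
exit1:
    return;
}
```

Each `goto` or conditional `goto` corresponds to exactly one jump or branch, which is why the compound C condition `j>=0 && v[j] > v[j+1]` turns into two separate exit tests.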

#### Saves and Restores

- Since we repeatedly call "swap" with \$a0 and \$a1, we begin "sort" by copying its arguments into \$s2 and \$s3 – must update the rest of the code in "sort" to use \$s2 and \$s3 instead of \$a0 and \$a1
- Must save \$ra at the start of "sort" because it will get over-written when we call "swap"
- Must also save \$s0-\$s3 so we don't overwrite something that belongs to the procedure that called "sort"

#### Saves and Restores

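```
sort:   addi $sp, $sp, -20       # make room on the stack for 5 registers
        sw   $ra, 16($sp)        # save $ra (overwritten by jal swap)
        sw   $s3, 12($sp)
        sw   $s2, 8($sp)
        sw   $s1, 4($sp)
        sw   $s0, 0($sp)         # save $s0-$s3 for the caller
        move $s2, $a0            # $s2 = v
        move $s3, $a1            # $s3 = n
        ...
        move $a0, $s2            # the inner loop body starts here: first arg = v
        move $a1, $s1            # second arg = j
        jal  swap
        ...
exit1:  lw   $s0, 0($sp)         # restore saved registers
        lw   $s1, 4($sp)
        lw   $s2, 8($sp)
        lw   $s3, 12($sp)
        lw   $ra, 16($sp)
        addi $sp, $sp, 20        # pop the stack frame
        jr   $ra                 # return to caller
```

9 lines of C code → 35 lines of assembly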

# Effect of Language and Algorithm



#### **Lessons Learnt**

- Instruction count and CPI are not good performance indicators in isolation
- Compiler optimizations are sensitive to the algorithm
- Java/JIT-compiled code is significantly faster than JVM-interpreted code
  - Comparable to optimized C in some cases
- Nothing can fix a dumb algorithm!

#### Other ISAs

- RISC-V
- ARM
- x86

#### Mainstream ISAs

#### Full RISC-V Architecture:

https://digitalassets.lib.berkeley.edu/techreports/ucb/text/EECS-2016-1.pdf



**ARM** 



#### x86

- Designer: Intel, AMD
- Bits: 16-bit, 32-bit and 64-bit
- Introduced: 1978 (16-bit), 1985 (32-bit), 2003 (64-bit)
- Design: CISC
- Type: Register-memory
- Encoding: Variable (1 to 15 bytes)
- Endianness: Little
- Examples: Core i3, i5, i7, ...

#### ARM architectures

- Designer: **ARM Holdings**
- Bits: 32-bit, 64-bit
- Introduced: 1985
- Design: RISC
- Type: Register-Register
- Encoding: AArch64/A64 and AArch32/A32 use 32-bit instructions; T32 (Thumb-2) uses mixed 16- and 32-bit instructions; ARMv7 userspace compatibility
- Endianness: Bi (little as default)
- Used in: smartphone-like devices (iPhone, Android), Raspberry Pi, embedded systems, Apple M series

#### **RISC-V**

- Designer: University of California, Berkeley
- Bits: 32, 64, 128
- Introduced: 2010
- Version: 2.2
- Design: RISC
- Type: Load-store
- Encoding: Variable
- Branching: Compare-and-branch
- Endianness: Little
- Versatile and open-source; relatively new, designed for cloud computing, embedded systems, and academic use

## RISC-V among ISAs



## Rapid RISC-V growth led by the industrial sector

- Semico Research predicts the market will consume 62.4 billion RISC-V CPU cores by 2025, a 146.2% CAGR over 2018-2025, with the industrial sector leading at 16.7 billion cores.
- Custom ICs Based on RISC-V Will Enable Cost-Effective IoT Product Differentiation



#### RISC-V foundation



## MIPS vs. RISC-V

#### Similar basic set of instructions

|                      | MIPS32               | RISC-V (RV32)      |
|----------------------|----------------------|--------------------|
| Date announced       | 1985                 | 2010               |
| License              | Proprietary          | Open-Source        |
| Instruction size     | 32 bits              | 32 bits            |
| Endianness           | Big-endian           | Little-endian      |
| Addressing modes     | 5                    | 4                  |
| Registers            | 32 × 32-bit          | 32 × 32-bit                     |
| Pipeline Stages      | 5 stages             | 5 stages                        |
| ISA type             | Load-store           | Load-store                      |
| Conditional branches | slt, sltu + beq, bne | beq, bne + blt, bge, bltu, bgeu |

## RISC-V Registers

- x0: the constant value 0
- x1: return address
- x2: stack pointer
- x3: global pointer
- x4: thread pointer
- x5–x7, x28–x31: temporaries
- x8: frame pointer
- x9, x18–x27: saved registers
- x10–x11: function arguments/results
- x12–x17: function arguments

| Name    | Register<br>number | Usage                          | Preserved on call? |
|---------|--------------------|--------------------------------|--------------------|
| x0      | 0                  | The constant value 0           | n.a.               |
| x1 (ra) | 1                  | Return address (link register) | yes                |
| x2 (sp) | 2                  | Stack pointer                  | yes                |
| x3 (gp) | 3                  | Global pointer                 | yes                |
| x4 (tp) | 4                  | Thread pointer                 | yes                |
| x5-x7   | 5–7                | Temporaries                    | no                 |
| x8-x9   | 8–9                | Saved                          | yes                |
| x10-x17 | 10–17              | Arguments/results              | no                 |
| x18-x27 | 18–27              | Saved                          | yes                |
| x28-x31 | 28–31              | Temporaries                    | no                 |

# RISC-V Assembly

| Category      | Instruction                | Example           | Meaning                   | Comments                                      |
|---------------|----------------------------|-------------------|---------------------------|-----------------------------------------------|
| Arithmetic    | Add                        | add x5, x6, x7    | x5 = x6 + x7              | Three register operands                       |
|               | Subtract                   | sub x5, x6, x7    | x5 = x6 - x7              | Three register operands                       |
|               | Add immediate              | addi x5, x6, 20   | x5 = x6 + 20              | Used to add constants                         |
| Data transfer | Load doubleword            | ld x5, 40(x6)     | x5 = Memory[x6 + 40]      | Doubleword from memory to register            |
|               | Store doubleword           | sd x5, 40(x6)     | Memory[x6 + 40] = x5      | Doubleword from register to memory            |
|               | Load word                  | lw x5, 40(x6)     | x5 = Memory[x6 + 40]      | Word from memory to register                  |
|               | Load word, unsigned        | lwu x5, 40(x6)    | x5 = Memory[x6 + 40]      | Unsigned word from memory to register         |
|               | Store word                 | sw x5, 40(x6)     | Memory[x6 + 40] = x5      | Word from register to memory                  |
|               | Load halfword              | lh x5, 40(x6)     | x5 = Memory[x6 + 40]      | Halfword from memory to register              |
|               | Load halfword,<br>unsigned | lhu x5, 40(x6)    | x5 = Memory[x6 + 40]      | Unsigned halfword from memory<br>to register  |
|               | Store halfword             | sh x5, 40(x6)     | Memory[x6 + 40] = x5      | Halfword from register to memory              |
|               | Load byte                  | lb x5, 40(x6)     | x5 = Memory[x6 + 40]      | Byte from memory to register                  |
|               | Load byte, unsigned        | lbu x5, 40(x6)    | x5 = Memory[x6 + 40]      | Unsigned byte from memory to register         |
|               | Store byte                 | sb x5, 40(x6)     | Memory[x6 + 40] = x5      | Byte from register to memory                  |
|               | Load reserved              | lr.d x5, (x6)     | x5 = Memory[x6]           | Load; 1st half of atomic swap                 |
|               | Store conditional          | sc.d x7, x5, (x6) | Memory[x6] = x5; x7 = 0/1 | Store; 2nd half of atomic swap                |
|               | Load upper<br>immediate    | lui x5, 0x12345   | x5 = 0x12345000           | Loads 20-bit constant shifted left<br>12 bits |
| Logical       | And                        | and x5, x6, x7    | x5 = x6 & x7              | Three reg. operands; bit-by-bit AND           |
|               | Inclusive or               | or x5, x6, x8     | x5 = x6 \| x8             | Three reg. operands; bit-by-bit OR            |
|               | Exclusive or               | xor x5, x6, x9    | x5 = x6 ^ x9              | Three reg. operands; bit-by-bit XOR           |
|               | And immediate              | andi x5, x6, 20   | x5 = x6 & 20              | Bit-by-bit AND reg. with constant             |
|               | Inclusive or immediate     | ori x5, x6, 20    | x5 = x6 \| 20             | Bit-by-bit OR reg. with constant              |
|               | Exclusive or immediate     | xori x5, x6, 20   | x5 = x6 ^ 20              | Bit-by-bit XOR reg. with constant             |

# RISC-V Assembly

| Category             | Instruction                          | Example          | Meaning                    | Comments                                                    |
|----------------------|--------------------------------------|------------------|----------------------------|-------------------------------------------------------------|
| Shift                | Shift left logical                   | sll x5, x6, x7   | x5 = x6 << x7              | Shift left by register                                      |
|                      | Shift right logical                  | srl x5, x6, x7   | x5 = x6 >> x7              | Shift right by register                                     |
|                      | Shift right arithmetic               | sra x5, x6, x7   | x5 = x6 >> x7              | Arithmetic shift right by register                          |
|                      | Shift left logical<br>immediate      | slli x5, x6, 3   | x5 = x6 << 3               | Shift left by immediate                                     |
|                      | Shift right logical immediate        | srli x5, x6, 3   | x5 = x6 >> 3               | Shift right by immediate                                    |
|                      | Shift right arithmetic immediate     | srai x5, x6, 3   | x5 = x6 >> 3               | Arithmetic shift right by immediate                         |
| Conditional branch   | Branch if equal                      | beq x5, x6, 100  | if (x5 == x6) go to PC+100 | PC-relative branch if registers equal                       |
|                      | Branch if not equal                  | bne x5, x6, 100  | if (x5 != x6) go to PC+100 | PC-relative branch if registers not equal                   |
|                      | Branch if less than                  | blt x5, x6, 100  | if (x5 < x6) go to PC+100  | PC-relative branch if registers less                        |
|                      | Branch if greater or equal           | bge x5, x6, 100  | if (x5 ≥ x6) go to PC+100  | PC-relative branch if registers greater or equal            |
|                      | Branch if less, unsigned             | bltu x5, x6, 100 | if (x5 < x6) go to PC+100  | PC-relative branch if registers less (unsigned)             |
|                      | Branch if greater/equal,<br>unsigned | bgeu x5, x6, 100 | if (x5 ≥ x6) go to PC+100  | PC-relative branch if registers greater or equal (unsigned) |
| Unconditional branch | Jump and link                        | jal x1, 100      | x1 = PC+4; go to PC+100    | PC-relative procedure call                                  |
|                      | Jump and link register               | jalr x1, 100(x5) | x1 = PC+4; go to x5+100    | Procedure return; indirect call                             |

## RISC-V: 6 instruction formats

- R-Format: instructions using 3 register operands
  - arithmetic/logical ops, e.g. add, xor
- I-Format: instructions with immediates, loads
  - addi, lw
- S-Format: store instructions: sw, sb
- SB-Format: branch instructions: beq, bge
- U-Format: instructions with upper immediates
  - lui; the upper immediate is 20 bits
- UJ-Format: the jump instruction: jal

| Name         | Field                       |        |        |        |               |        | Comments                      |
|--------------|-----------------------------|--------|--------|--------|---------------|--------|-------------------------------|
| (Field Size) | 7 bits                      | 5 bits | 5 bits | 3 bits | 5 bits        | 7 bits |                               |
| R-type       | funct7                      | rs2    | rs1    | funct3 | rd            | opcode | Arithmetic instruction format |
| I-type       | immediate[11:0]             |        | rs1    | funct3 | rd            | opcode | Loads & immediate arithmetic  |
| S-type       | immed[11:5]                 | rs2    | rs1    | funct3 | immed[4:0]    | opcode | Stores                        |
| SB-type      | immed[12,10:5]              | rs2    | rs1    | funct3 | immed[4:1,11] | opcode | Conditional branch format     |
| UJ-type      | immediate[20,10:1,11,19:12] |        |        |        | rd            | opcode | Unconditional jump format     |
| U-type       | immediate[31:12]            |        |        |        | rd            | opcode | Upper immediate format        |

## RISC-V R-format Instructions

| funct7 | rs2    | rs1    | funct3 | rd     | opcode |
|--------|--------|--------|--------|--------|--------|
| 7 bits | 5 bits | 5 bits | 3 bits | 5 bits | 7 bits |

- Instruction fields
  - opcode: operation code
  - rd: destination register number
  - funct3: 3-bit function code (additional opcode)
  - rs1: the first source register number
  - rs2: the second source register number
  - funct7: 7-bit function code (additional opcode)
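- Example: add x9, x20, x21

| funct7  | rs2   | rs1   | funct3 | rd    | opcode  |
|---------|-------|-------|--------|-------|---------|
| 0       | 21    | 20    | 0      | 9     | 51      |
| 0000000 | 10101 | 10100 | 000    | 01001 | 0110011 |

0000 0001 0101 1010 0000 0100 1011 0011<sub>two</sub> = $015A04B3_{16}$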
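The R-type packing can be checked with a short C sketch. It assumes the RV32 field widths shown in the table (funct7 7, rs2 5, rs1 5, funct3 3, rd 5, opcode 7 bits); `rv_encode_r` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: RV32 R-type packing,
   funct7(7) rs2(5) rs1(5) funct3(3) rd(5) opcode(7), high bits to low. */
uint32_t rv_encode_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                     uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7Fu) << 25) | ((rs2 & 0x1Fu) << 20) |
           ((rs1 & 0x1Fu) << 15) | ((funct3 & 0x7u) << 12) |
           ((rd & 0x1Fu) << 7) | (opcode & 0x7Fu);
}
```

`rv_encode_r(0, 21, 20, 0, 9, 51)` reproduces 0x015A04B3 for `add x9, x20, x21`. Changing only funct7 to 0100000<sub>two</sub> (0x20) gives `sub x9, x20, x21`, which shows how funct7 acts as an extra opcode.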

## RISC-V I-format Instructions



- Immediate arithmetic and load instructions
  - rs1: source or base address register number
  - immediate: constant operand, or offset added to base address
    - two's complement, sign-extended

Example: addi x15, x1, -40

| immediate          | rs1              | funct3 | rd                | opcode  |
|--------------------|------------------|--------|-------------------|---------|
| -40 <sub>dec</sub> | 1 <sub>dec</sub> | 0      | 15 <sub>dec</sub> | 0x13    |
| 111111011000       | 00001            | 000    | 01111             | 0010011 |
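The immediate row above is -40 stored as a 12-bit two's-complement value. Reading it back means sign-extending bit 11. The C sketch below shows both directions. `encode_i` and `decode_i_imm` are hypothetical helpers written for illustration, and the sketch assumes the addi opcode 0010011 (0x13) with funct3 = 0, as in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): pack an I-type instruction.
   Only the low 12 bits of the two's-complement immediate are kept. */
static uint32_t encode_i(int32_t imm, uint32_t rs1, uint32_t funct3,
                         uint32_t rd, uint32_t opcode)
{
    assert(imm >= -2048 && imm <= 2047);   /* must fit in 12 signed bits */
    assert(rs1 < 32 && funct3 < 8 && rd < 32 && opcode < 128);
    return (((uint32_t)imm & 0xFFFu) << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* Hypothetical helper: take bits [31:20] and sign-extend from bit 11. */
static int32_t decode_i_imm(uint32_t insn)
{
    int32_t imm = (int32_t)(insn >> 20);   /* 0..4095 */
    if (imm & 0x800)                       /* bit 11 set: negative value */
        imm -= 0x1000;
    return imm;
}
```

With these assumptions, `encode_i(-40, 1, 0, 15, 0x13)` gives the full word 0xFD808793, and `decode_i_imm` recovers -40 from it.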

## RISC-V S-format Instructions



- Different immediate format for store instructions
  - rs1: base address register number
  - rs2: source operand register number
  - immediate: offset added to base address
    - Split so that the rs1 and rs2 fields are always in the same place
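The split immediate can be illustrated by packing a store and then reassembling its offset. The C sketch below uses `encode_s` and `decode_s_imm`, hypothetical helpers written for illustration. It assumes the standard RV32I encoding of `sw`: opcode 0100011 (0x23) and funct3 010.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): pack an S-type store.
   imm[11:5] goes to bits [31:25] and imm[4:0] to bits [11:7], so
   rs2 and rs1 keep the same positions they have in the R-format. */
static uint32_t encode_s(int32_t imm, uint32_t rs2, uint32_t rs1,
                         uint32_t funct3, uint32_t opcode)
{
    assert(imm >= -2048 && imm <= 2047);
    assert(rs2 < 32 && rs1 < 32 && funct3 < 8 && opcode < 128);
    uint32_t u = (uint32_t)imm & 0xFFFu;   /* 12-bit two's complement */
    return ((u >> 5) << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | ((u & 0x1Fu) << 7) | opcode;
}

/* Hypothetical helper: glue the two immediate pieces back together
   and sign-extend from bit 11. */
static int32_t decode_s_imm(uint32_t insn)
{
    int32_t imm = (int32_t)(((insn >> 25) << 5) | ((insn >> 7) & 0x1Fu));
    if (imm & 0x800)
        imm -= 0x1000;
    return imm;
}
```

Under these assumptions, `sw x9, 100(x22)` corresponds to `encode_s(100, 9, 22, 2, 0x23)` and packs to 0x069B2223. Passing that word to `decode_s_imm` gives back 100.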



# RISC-V Addressing Summary



## MIPS & RISC-V Instruction Formats

| Format            | ISA    | Fields, most significant first (bit positions)                                                            |
|-------------------|--------|-----------------------------------------------------------------------------------------------------------|
| Register-register | RISC-V | funct7(7) [31:25], rs2(5) [24:20], rs1(5) [19:15], funct3(3) [14:12], rd(5) [11:7], opcode(7) [6:0]        |
|                   | MIPS   | Op(6) [31:26], Rs1(5) [25:21], Rs2(5) [20:16], Rd(5) [15:11], Const(5) [10:6], Opx(6) [5:0]                |
| Load              | RISC-V | immediate(12) [31:20], rs1(5) [19:15], funct3(3) [14:12], rd(5) [11:7], opcode(7) [6:0]                    |
|                   | MIPS   | Op(6) [31:26], Rs1(5) [25:21], Rs2(5) [20:16], Const(16) [15:0]                                            |
| Store             | RISC-V | immediate(7) [31:25], rs2(5) [24:20], rs1(5) [19:15], funct3(3) [14:12], immediate(5) [11:7], opcode(7) [6:0] |
|                   | MIPS   | Op(6) [31:26], Rs1(5) [25:21], Rs2(5) [20:16], Const(16) [15:0]                                            |
| Branch            | RISC-V | immediate(7) [31:25], rs2(5) [24:20], rs1(5) [19:15], funct3(3) [14:12], immediate(5) [11:7], opcode(7) [6:0] |
|                   | MIPS   | Op(6) [31:26], Rs1(5) [25:21], Opx/Rs2(5) [20:16], Const(16) [15:0]                                        |
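The comparison above shows a key RISC-V design choice. rs1, rs2 and rd sit at fixed bit positions in every format that has them. In MIPS, an R-format instruction writes Rd (bits 15:11), but a load such as `lw rt, offset(rs)` writes the register in bits 20:16. The C sketch below extracts the register fields of an R-format word for each ISA. `bits`, `RegFields`, `riscv_r_fields` and `mips_r_fields` are hypothetical names written for illustration. The test words assume the standard encodings of RISC-V `add x9, x20, x21` (0x015A04B3) and MIPS `add $t0, $s1, $s2` (0x02324020).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): extract bits [hi:lo] of w. */
static uint32_t bits(uint32_t w, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    return (w >> lo) & (uint32_t)((1ull << (hi - lo + 1)) - 1);
}

/* Hypothetical record of the destination and two source registers. */
typedef struct { uint32_t dst, src1, src2; } RegFields;

/* RISC-V R-format: rd[11:7], rs1[19:15], rs2[24:20]. */
static RegFields riscv_r_fields(uint32_t w)
{
    RegFields f = { bits(w, 11, 7), bits(w, 19, 15), bits(w, 24, 20) };
    return f;
}

/* MIPS R-format: Rd[15:11], Rs1 (rs)[25:21], Rs2 (rt)[20:16]. */
static RegFields mips_r_fields(uint32_t w)
{
    RegFields f = { bits(w, 15, 11), bits(w, 25, 21), bits(w, 20, 16) };
    return f;
}
```

Because RISC-V never moves rd, a decoder can start reading the register file before it has fully decoded the opcode. The MIPS decoder, by contrast, must first pick the destination field.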

## **ARM Market share**

## Markets for ARM in 2012

Devices shipped (millions of units)

| Segment    | Device                         | 2012 Devices | Chips/Device | TAM 2012 Chips | 2012 ARM Chips | 2012 ARM Share |
|------------|--------------------------------|--------------|--------------|----------------|----------------|----------------|
| Mobile     | Smart Phone                    | 730          | 3-5          | 2,500          | 2,200          | 90%            |
|            | Feature Phone                  | 460          | 2-3          | 1,200          | 1,100          | 95%            |
|            | Low End Voice                  | 730          | 1-2          | 730            | 700            | 95%            |
|            | Portable Media Players         | 130          | 1-3          | 250            | 220            | 90%            |
|            | Mobile Computing* (apps only)  | 400          | 1            | 400            | 160            | 40%            |
| Home       | Digital Camera                 | 150          | 1-2          | 230            | 180            | 80%            |
|            | Digital TV & Set-top-box       | 420          | 1-2          | 640            | 290            | 45%            |
|            | Desktop PCs & Servers (apps)   | 200          | 1            | 200            | -              | 0%             |
| Enterprise | Networking                     | 1,200        | 1-2          | 1,300          | 420            | 35%            |
|            | Printers                       | 120          | 1            | 120            | 85             | 70%            |
|            | Hard Disk & Solid State Drives | 700          | 1            | 700            | 620            | 90%            |
| Embedded   | Automotive                     | 2,600        | 1            | 2,600          | 210            | 8%             |
|            | Smart Card                     | 6,000        | 1            | 6,000          | 710            | 13%            |
|            | Microcontrollers               | 8,700        | 1            | 8,700          | 1,500          | 18%            |
|            | Others **                      | 2,000        | 1            | 2,000          | 300            | 15%            |
|            | **Total**                      | 25,500       |              | 27,000         | 8,700          | 32%            |

| Year | Market<br>Share |
|------|-----------------|
| 2007 | 17%             |
| 2008 | 20%             |
| 2009 | 22%             |
| 2010 | 25%             |
| 2011 | 29%             |
| 2012 | 32%             |

Source: Gartner, IDC, SIA, and ARM estimates

# **ARM Applications**



## **ARM CPU Series**



## **ARM & MIPS Similarities**

- ARM: the most popular embedded core
- Basic instruction set similar to that of MIPS

|                       | ARM              | MIPS             |
|-----------------------|------------------|------------------|
| Date announced        | 1985             | 1985             |
| Instruction size      | 32 bits          | 32 bits          |
| Address space         | 32-bit flat      | 32-bit flat      |
| Data alignment        | Aligned          | Aligned          |
| Data addressing modes | 9                | 3                |
| Registers             | 15 × 32-bit      | 31 × 32-bit      |
| Input/output          | Memory<br>mapped | Memory<br>mapped |

## ARM v8 Instructions

- In moving to 64-bit, ARM did a complete overhaul
- ARM v8 resembles MIPS
  - Changes from v7:
    - No conditional execution field
    - Immediate field is 12-bit constant
    - Dropped load/store multiple
    - PC is no longer a GPR
    - GPR set expanded to 32
    - Addressing modes work for all word sizes
    - Divide instruction
    - Branch if equal/branch if not equal instructions

## The Intel x86 ISA

- Evolution with backward compatibility
  - 8080 (1974): 8-bit microprocessor
    - Accumulator, plus 3 index-register pairs
  - 8086 (1978): 16-bit extension to 8080
    - Complex instruction set (CISC)
  - 8087 (1980): floating-point coprocessor
    - Adds FP instructions and register stack
  - 80286 (1982): 24-bit addresses, MMU
    - Segmented memory mapping and protection
  - 80386 (1985): 32-bit extension (now IA-32)
    - Additional addressing modes and operations
    - Paged memory mapping as well as segments

#### The Intel x86 ISA

- Further evolution...
  - i486 (1989): pipelined, on-chip caches and FPU
    - Compatible competitors: AMD, Cyrix, ...
  - Pentium (1993): superscalar, 64-bit datapath
    - Later versions added MMX (Multi-Media eXtension) instructions
    - The infamous FDIV bug
  - Pentium Pro (1995), Pentium II (1997)
    - New microarchitecture (see Colwell, The Pentium Chronicles)
  - Pentium III (1999)
    - Added SSE (Streaming SIMD Extensions) and associated registers
  - Pentium 4 (2001)
    - New microarchitecture
    - Added SSE2 instructions

#### The Intel x86 ISA

- And further...
  - AMD64 (2003): extended architecture to 64 bits
  - EM64T (Extended Memory 64 Technology, 2004)
    - AMD64 adopted by Intel (with refinements)
    - Added SSE3 instructions
  - Intel Core (2006)
    - Added SSE4 instructions, virtual machine support
  - AMD64 (announced 2007): SSE5 instructions
    - Intel declined to follow, pursuing its own Advanced Vector Extension instead
  - Advanced Vector Extension (announced 2008)
    - Longer SSE registers, more instructions
- If Intel didn't extend with compatibility, its competitors would!
  - Technical elegance ≠ market success

# Basic x86 Registers



# **Concluding Remarks**

- Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises
- Layers of software/hardware
  - Compiler, assembler, hardware
- MIPS: typical of RISC ISAs
  - cf. x86